Quantifying Word Order Freedom in Dependency Corpora
Abstract
Using recently available dependency corpora, we present novel measures of a key quantitative property of language, word order freedom: the extent to which word order in a sentence is free to vary while conveying the same meaning. We discuss two topics. First, we discuss linguistic and statistical issues associated with our measures and with the annotation styles of available corpora. We find that we can measure reliable upper bounds on word order freedom in head direction and the ordering of certain sisters, but that more general measures of word order freedom are not currently feasible. Second, we present results of our measures in 34 languages and demonstrate a correlation between quantitative word order freedom of subjects and objects and the presence of nominative-accusative case marking. To our knowledge this is the first large-scale quantitative test of the hypothesis that languages with more word order freedom have more case marking (Sapir, 1921; Kiparsky, 1997).
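The abstract does not give the formula behind its measures. One common way to quantify head-direction freedom from a dependency corpus is the conditional entropy of head direction given the dependency relation type. The sketch below assumes that operationalization, a plug-in (maximum-likelihood) estimator, and hypothetical toy arcs rather than real corpus data. The function name `conditional_entropy` and the direction labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """Plug-in estimate of H(Y | X) in bits from a list of (x, y) pairs.

    Here x is assumed to be a dependency relation and y the side of the
    head on which the dependent appears (hypothetical labels).
    """
    joint = Counter(pairs)                      # counts of (x, y)
    marginal = Counter(x for x, _ in pairs)     # counts of x alone
    total = len(pairs)
    h = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / total                     # empirical p(x, y)
        p_y_given_x = n_xy / marginal[x]        # empirical p(y | x)
        h -= p_xy * log2(p_y_given_x)
    return h


# Hypothetical toy data: one (relation, head direction) pair per arc.
arcs = [
    ("nsubj", "dep-before-head"), ("nsubj", "dep-before-head"),
    ("nsubj", "dep-after-head"), ("nsubj", "dep-before-head"),
    ("obj", "dep-after-head"), ("obj", "dep-after-head"),
    ("obj", "dep-after-head"), ("obj", "dep-after-head"),
]

print(round(conditional_entropy(arcs), 3))  # prints 0.406
```

In this toy example, `obj` always follows its head and contributes 0 bits, while `nsubj` (3 of 4 before the head) contributes about 0.811 bits, weighted by its share of the arcs. Plug-in entropy estimates are biased downward on small samples, so per-relation counts need to be reasonably large before such values are compared across corpora.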
Similar Papers
Diachronic Trends in Word Order Freedom and Dependency Length in Dependency-Annotated Corpora of Latin and Ancient Greek
One easily observable aspect of language variation is the order of words. In human and machine natural language processing, it is often claimed that parsing free-order languages is more difficult than parsing fixed-order languages. In this study on Latin and Ancient Greek, two well-known and well-documented free-order languages, we propose syntactic correlates of word order freedom. We apply our ...
The Effect of Morphemes on Dependency Parsing of Persian
Data-driven systems can easily be adapted to different languages and domains. Applying this trend to dependency parsing has led to the introduction of data-driven approaches. The existence of appropriate corpora containing sentences and their associated dependency trees is the only prerequisite for data-driven approaches. Despite highly accurate results for the dependency parsing task in English langu...
The Induction and Evaluation of Word Order Rules using Corpora based on the Two Concepts of Topological Models
Using dependency trees in natural language generation and machine translation raises the need to derive the word order from dependency trees. This task is difficult for languages with (partly) free word order and comparatively easier for languages with fixed word order. This paper describes (a) the two basic elements of topological models, (b) rule patterns for the mapping of dependency trees to ...
Searching for a Measure of Word Order Freedom
This paper compares various means of measuring word order freedom applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project, and the word order statistics are relative frequencies of all word order combinations of subject, predicate and object in both main and subordinate clauses. The measures include Euclidean distance, max-min distance,...
Dependency Analyser Configurable by Measures
In this paper we present a dependency analyser capable of syntactic recognition and analysis according to dependency grammars. The analyser is able to deal with non-projective constructions, and it has means to express the level of word-order freedom and its limitations. The level of word-order freedom and the level of robustness (correctness) of sentences can be given as parameters of the analysis. Data and ...
Publication date: 2015